DNA sequence analysis

ABSTRACT

There is disclosed a method for determining the identity of one or more mutations or single nucleotide polymorphisms (SNPs) in a genome, comprising: a. contacting a sample genome, under conditions which permit template-dependent oligonucleotide ligation, with a plurality of different oligonucleotide molecules which comprise (i) a first set of oligonucleotides each comprising a sequence of nucleotides that is complementary to a region on said genome that includes a known SNP site and which oligonucleotides are complementary to said region other than at a base at or near the 5′ end of said oligonucleotides that is to be tested for complementarity to a base at the SNP site, each of said oligonucleotides comprising a unique label to identify both the base to be tested and the position of the SNP to be scored, (ii) a second set of oligonucleotides each comprising a sequence of nucleotides complementary to a region on said target genome for hybridisation with said target genome adjacent to the 5′ end of an oligonucleotide of said first oligonucleotide set, and a surface capture moiety, a phosphate moiety being located at any of either the 5′ end of said first set of oligonucleotides or the 3′ end of said second set of oligonucleotides, any resulting ligated oligonucleotide being immobilised on a solid support via the surface capture moiety, b. analysing said solid support for the identity of one or more of said unique labels and comparing the defined bases in any of said immobilised oligonucleotides to those of the reference one or more SNPs.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a National Stage Application claiming the priority of co-pending PCT Application No. PCT/GB2004/000770 filed Feb. 26, 2004, which in turn, claims priority from Great Britain Application Serial No. 0304371.8, filed Feb. 26, 2003. Applicants claim the benefits of 35 U.S.C. § 120 as to the PCT application and priority under 35 U.S.C. § 119 as to the said Great Britain application, and the entire disclosures of both applications are incorporated herein by reference in their entireties.

FIELD OF THE INVENTION

This invention relates to a method for detecting variations in the sequences of nucleic acid fragments, particularly in the DNA sequences of genes in a sample obtained from a patient.

BACKGROUND OF THE INVENTION

Recently, the Human Genome Project determined the entire sequence of the human genome, all 3×10⁹ bases. The sequence information represents that of an average human. However, there is still considerable interest in identifying differences in the genetic sequence between different individuals. The most common form of genetic variation is the single nucleotide polymorphism (SNP). On average one base in 1000 is a SNP, which means that there are about 3 million SNPs for any individual. Some of the SNPs are in coding regions and produce proteins with different binding affinities or properties. Some are in regulatory regions and result in a different response to changes in levels of metabolites or messengers. SNPs are also found in non-coding regions, and these are also important as they may correlate with SNPs in coding or regulatory regions. The key problem is to develop a low-cost way of determining one or more of the SNPs for an individual.

Nucleic acid arrays have been used to determine SNPs, usually in the context of monitoring hybridisation events (Mirzabekov, Trends in Biotechnology (1994) 12:27-32). Many of these hybridisation events are detected using fluorescent labels attached to nucleotides, the labels being detected using a sensitive fluorescent detector, e.g. a charge-coupled detector (CCD). The major disadvantage of these methods is that repeat sequences can lead to ambiguity in the results. This problem is recognised in Automation Technologies for Genome Characterisation, Wiley-Interscience (1997), ed. T. J. Beugelsdijk, Chapter 10: 205-225.

Other analysis methods require the sequencing of genomic fragments using high-density polynucleotide arrays. The use of high-density arrays in a multi-step analysis procedure can lead to problems with phasing. Phasing problems result from a loss in the synchronisation of a reaction step occurring on different molecules of the array. If some of the arrayed molecules fail to undergo a step in the procedure, subsequent results obtained for these molecules will no longer be in step with results obtained for the other arrayed molecules. The proportion of molecules out of phase will increase through successive steps and consequently the results detected will become ambiguous. This problem is recognised in the sequencing procedure described in U.S. Pat. No. 5,302,509.

An alternative sequencing approach is disclosed in EP-A-0381693, which comprises hybridising a fluorescently labelled strand of DNA to a target DNA sample suspended in a flowing sample stream, and then using an exonuclease to cleave repeatedly the end base from the hybridised DNA. The cleaved bases are detected in sequential passage through a detector, allowing reconstruction of the base sequence of the DNA. Each of the different nucleotides has a distinct fluorescent label attached, which is detected by laser-induced fluorescence. This is a complex method, primarily because it is difficult to ensure that every nucleotide of the DNA strand is labelled and that this has been achieved with high fidelity to the original sequence.

SUMMARY OF THE INVENTION

The present invention is based on the realisation that the information provided by the Human Genome Sequencing Project can be used to design specific oligonucleotides that can be used to hybridise over a putative SNP site and, in the presence of the correct DNA sequence, undergo a template-dependent ligation event. In the event that a first of the oligonucleotides includes at or near its 5′ end a base complementary to the base at a SNP site, optionally with an associated phosphate, and the second oligonucleotide incorporates a surface capture moiety, in the presence of a DNA ligase these oligonucleotides may be joined to form a single ligated oligonucleotide that includes the surface capture moiety. The first oligonucleotide incorporates a unique label which may be in the form of a sequence code, identifying both the base near or at its 5′ end and the position of the SNP to be scored in the genome, and therefore only those oligonucleotide molecules that have undergone the ligase reaction will be immobilised on a solid surface with the corresponding code. Analysing the label allows scoring and identification of the SNP site, which can then be compared with a reference sequence. A phosphate group is required for ligation of the oligonucleotides. This may also be located on the 3′-end of the second oligonucleotide rather than the 5′-end of the first. Furthermore, a non-enzymatic chemical ligation may be used, such as 5′-iodide with 3′-selenophosphate, within the context of the invention. Multiple oligonucleotides can be used in one experiment. This obviates the need to sequence the entire genome to identify multiple SNP sites, leading to a reduction in costs and processing time.

Therefore, according to a first aspect of the invention there is provided a method for determining the identity of one or more mutations or single nucleotide polymorphisms (SNPs) in a genome, comprising:

-   a. contacting a sample genome, under conditions which permit template-dependent oligonucleotide ligation, with a plurality of different oligonucleotide molecules which comprise
    -   (i) a first set of oligonucleotides each comprising a sequence of nucleotides that is complementary to a region on said genome that includes a known SNP site and which oligonucleotides are complementary to said region other than at a base at or near the 5′ end of said oligonucleotides in said first oligonucleotide set that is to be tested for complementarity to a base at the SNP site, each of said oligonucleotides comprising a unique label to identify both the base to be tested and the position of the SNP to be scored,
    -   (ii) a second set of oligonucleotides each comprising a sequence of nucleotides complementary to a region on said target genome for hybridisation with said target genome adjacent to the 5′ end of an oligonucleotide of said first oligonucleotide set, and a surface capture moiety,
        -   a phosphate moiety being located at any of either the 5′ end of said first set of oligonucleotides or the 3′ end of said second set of oligonucleotides,
        -   any resulting ligated oligonucleotide being immobilised on a solid support via the surface capture moiety,
-   b. analysing said solid support for the identity of one or more of said unique labels and comparing the defined bases in any of said immobilised oligonucleotides to those of the reference one or more SNPs.

The method is particularly advantageous because only those first oligonucleotide molecules that incorporate a base that is complementary to the SNP in the sample genome will undergo the ligation reaction with the second oligonucleotide incorporating the surface capture moiety for subsequent immobilisation on the solid surface. Accordingly, any ligated sequences immobilised on the solid surface may be analysed and compared to a reference sequence to establish whether the sample genome is the same as the reference sequence at the SNP. The method also renders it possible to establish whether the sample genome is homozygous or heterozygous at any SNP, the label being specific both for the base at the SNP and the position of the SNP in the genome.

FIG. 1 is a schematic view of a preferred embodiment of the method according to the invention.

DESCRIPTION OF THE INVENTION

The present invention relates to a method that can be used to identify SNP sites, and particularly multiple SNP sites, in a target genome. The present invention is, therefore, useful to determine whether a subject has a particular SNP, and therefore a risk of disease. Many cancers are caused by genetic mutation on particular genes; for example, a single mutation is implicated in breast cancer. The methods of the present invention can be used to screen for a wide variety of mutations that have been implicated in disease. The ability to screen for multiple (e.g. thousands of) potential SNPs in a single experiment is therefore of great benefit.

The method relies on the ability to utilise the information provided by genome sequencing efforts, such as the Human Genome Project, to compare genomic sequences in a sample with a reference or wild-type sequence, to identify any aberrations. SNP sites are known and it is possible to use this information to design oligonucleotide molecules that are complementary to sequences on the genome immediately adjacent to and overlapping the SNP site. The method of the invention relies on the design of two specific sets of oligonucleotides provided for hybridization on the sample genome overlapping the SNP and which undergo a ligation reaction only if the fully complementary sequence is present in the genomic sample. The first oligonucleotide set according to the invention preferably includes a region complementary to the target genome up to but not including the specific SNP site itself. Any resulting ligated oligonucleotide incorporates a unique label, such as for example a unique sequence, that can then be analysed following its immobilization on a solid surface. The unique label identifies both the SNP of the reference sequence and the position of the SNP in the genome. The immobilization occurs via a surface attachment moiety present on the oligonucleotide immediately 3′ to the SNP, which oligonucleotide does not include the SNP information.

The unique label or sequence code on the first oligonucleotide set is specific for both the defined base at or near the 5′ end of the first oligonucleotide to be tested for complementarity with the base at the SNP site and the position of the SNP in question, and renders it possible to identify multiple SNPs in the target genome. In the embodiment where the unique label comprises a unique sequence of nucleotides on the first oligonucleotide set, cycles of sequencing reactions may be carried out to identify the particular code on any oligonucleotide that is immobilised on the surface of the solid support. In this regard, it is particularly beneficial for the first oligonucleotide type to comprise a self-priming hairpin oligonucleotide that forms a hairpin loop structure, in which case only those hairpins immobilized on the solid surface will be able to undergo the sequencing reaction.

The term “hairpin loop structure” refers to a molecular stem and loop structure formed from the hybridization of complementary polynucleotides that are covalently linked. The stem comprises the hybridized polynucleotides and the loop is the region that covalently links the two complementary polynucleotides. A double-stranded nucleic acid of anything from 5 to 20 (or more) base pairs may be used to form the stem. In one embodiment, the structure may be formed from a single-stranded polynucleotide having complementary regions. The loop in this embodiment may comprise two or more non-hybridised nucleotides. In a second embodiment, the structure is formed from two separate polynucleotides with complementary regions, the two polynucleotides being linked and the loop being at least partially formed from a linker moiety. The linker moiety forms a covalent attachment between the ends of the two polynucleotides. Linker moieties suitable for use in this embodiment will be apparent to the skilled practitioner. For example, the linker moiety may be polyethylene glycol (PEG).

In a preferred embodiment, the unique sequence code is located on the oligonucleotide proximal to the hairpin so that only limited sequencing is required to sequence the code and thus establish both the identity of the defined base and the position of the SNP. It is particularly beneficial in the performance of the invention that the sequencing reaction is carried out on the known sequence of the oligonucleotide incorporating the hairpin because the order of the bases can be controlled so that the number of bases sequenced can be optimized. In a preferred embodiment, the assay will be multiplexed to identify thousands of SNPs.

The first oligonucleotide set may therefore preferably comprise a hairpin loop structure having a 5′ phosphate; a defined base at or near, but preferably at, the 5′ end, complementary to one of the heterozygotes under investigation; about 20 bases of sequence complementary to the genomic sample on the 5′ side of the SNP; the unique sequence code that identifies both the 20 bases of sequence, within the context of the other hairpins used in multiplexing, and the identity of the 5′ base; and a self-complementary loop that allows the free 3′ hydroxyl to base pair adjacent to the start of the code.

The second oligonucleotide, on the other hand, may be a single-stranded sequence of about 20 bases from the 3′ side of the SNP complementary to the genomic sequence. The 3′ hydroxyl end of this strand will be complementary to the genomic sample immediately adjacent to the 5′-end of the first oligonucleotide type. Because this sequence does not vary with heterozygosity, only one such sequence type is required per assay. The 5′ end of this oligonucleotide type contains a modification to allow surface attachment, for example biotin or thiophosphate. A variation on this method would be to locate the phosphate group on the 3′-hydroxyl of the second oligonucleotide instead of the 5′-position of the first oligonucleotide.
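The geometry of the two oligonucleotide types can be illustrated with a minimal sketch. It assumes the genomic template is given as a 5′→3′ string and the SNP as a 0-based index; the function names, the `arm` parameter and the toy template are hypothetical, and the hairpin stem, loop and sequence code are deliberately omitted.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_probes(template: str, snp_index: int, arm: int = 20):
    """Sketch the hybridising parts of both probe types for one SNP.

    template  : genomic strand, written 5'->3' (assumption of this sketch)
    snp_index : 0-based position of the SNP on that strand
    arm       : number of flanking bases each probe hybridises to
    """
    if snp_index - arm < 0 or snp_index + arm >= len(template):
        raise ValueError("not enough flanking sequence around the SNP")

    upstream = template[snp_index - arm:snp_index]            # 5' side of the SNP
    downstream = template[snp_index + 1:snp_index + 1 + arm]  # 3' side of the SNP

    # Second (capture) probe: complementary to the 3' side of the SNP, so its
    # 3' end abuts the 5' end of the first probe once both are hybridised.
    capture_probe = reverse_complement(downstream)

    # First probe: one variant per tested base, with that base at the 5' end,
    # followed by the complement of the 5' flank of the SNP.
    first_probes = {base: base + reverse_complement(upstream) for base in "ACGT"}
    return capture_probe, first_probes


capture, first = design_probes("AAACGTTCA", snp_index=4, arm=3)
# The template carries G at the SNP, so only the variant starting with C is
# fully complementary; capture + first["C"] is the would-be ligation product.
print(capture, first["C"], capture + first["C"])
```

In this layout the ligation product, read 5′→3′, is simply the reverse complement of the template region spanning both flanks and the SNP, which is a convenient consistency check on the orientation.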

Establishing heterozygosity of 1000 SNPs from a patient sample will require up to 5000 pieces of DNA: for each SNP, a hairpin oligonucleotide for each defined base at or near its 5′ end and an oligonucleotide of the second type incorporating the surface capture moiety. The oligonucleotides are prepared prior to the assay and added to the sample genomic DNA along with a DNA ligase, such as T4 DNA ligase. The ligation reaction is allowed to proceed in solution and is then terminated by washing over a suitable capture surface such as a streptavidin slide. The sample is diluted prior to capture to allow formation of a single molecule array. Depending on heterozygosity, this process should give up to 2000 hairpins captured many times over the surface. Cycles of sequencing by synthesis may then be performed on any captured hairpins. The sequencing is performed to establish the identity of the codes. 1000 hairpins require 5 cycles (4⁵) and a further cycle to call the SNP, i.e. the identity of the 5′ base of the preligated hairpin.
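The figures in this paragraph can be checked with a short calculation. The sketch below assumes four hairpin variants (A, C, G, T) and one capture oligonucleotide per SNP, and a code read with one of four bases per cycle; the function names are hypothetical.

```python
def oligo_budget(n_snps: int, bases_per_snp: int = 4) -> int:
    """Upper bound on distinct oligonucleotides: hairpins plus capture probes."""
    hairpins = n_snps * bases_per_snp   # one hairpin per tested base per SNP
    capture_probes = n_snps             # one capture probe per SNP
    return hairpins + capture_probes


def cycles_for_codes(n_codes: int, alphabet_size: int = 4) -> int:
    """Smallest number of sequencing cycles whose code space covers n_codes."""
    cycles = 0
    capacity = 1
    while capacity < n_codes:
        capacity *= alphabet_size
        cycles += 1
    return cycles


print(oligo_budget(1000))      # 5000 pieces of DNA
print(cycles_for_codes(1000))  # 5 cycles, since 4**5 = 1024 >= 1000
```

With at most two alleles present per SNP, at most 2 × 1000 = 2000 distinct hairpins can be ligated and captured, which matches the figure quoted above.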

The sample genomic DNA may be obtained by methods known in the art. In one embodiment, the genomic DNA may be fragmented prior to hybridization and ligation of the oligonucleotide molecules. Fragmentation may be carried out by any suitable method, including restriction enzyme digestion and/or the use of shear forces.

The oligonucleotides are preferably brought into contact with the fragments in solution under ligation conditions, so that duplex formation occurs between complementary oligonucleotide sequences and genomic fragments. Ligation conditions are known in the art and suitable buffers, salt concentrations, temperatures etc. will all be apparent to the skilled person. After the ligation step, the resulting duplexes are immobilised onto a solid support.

Immobilisation of the ligated oligonucleotide molecule to the surface of a solid support may be carried out by techniques known in the art to form an array, which in one embodiment, as set out in more detail below, may provide adequate separation for individual resolution of the hairpin oligonucleotides. In the context of the present invention, an array refers to a population of polynucleotide molecules distributed over the solid support. Generally, the array is produced by dispensing small volumes of a sample to generate a random single molecule array. In this manner, a mixture of different molecules may be arrayed by simple means to produce a single molecule array. In this embodiment, both ligated and non-ligated oligonucleotides will be immobilised onto the solid support. However, those fragments that are not ligated to the hairpin oligonucleotide will not undergo the sequencing reaction and so will not generate a detectable signal.

The preligated oligonucleotide molecules contain a surface capture group that permits attachment to a complementary moiety on the surface of the solid support. This may be achieved by various techniques including, preferably, the incorporation of a nucleotide onto the 5′ end of the second oligonucleotide type, the nucleotide being modified with a linker molecule that reacts with a suitably prepared solid support. The modified nucleotide can be incorporated onto the oligonucleotide in a conventional way using DNA synthesis or enzymatically using a terminal transferase. This incorporation step is carried out prior to the ligation step with the genomic sample.

Solid supports suitable for use in the invention are available commercially, and will be apparent to the skilled person. The supports may be manufactured from materials such as glass, ceramics, silica and silicon. The supports usually comprise a flat (planar) surface. Any suitable size may be used. For example, the supports might be of the order of 1 to 10 cm in each direction.

Immobilisation may be by specific covalent or non-covalent interactions. The oligonucleotide can be attached to the solid support at any position along its length, the attachment acting to tether the polynucleotide to the solid support. The immobilised oligonucleotide is then able to undergo the sequencing reaction. Immobilisation in this manner results in well separated single hairpin oligonucleotides.

After immobilisation, the incorporation of bases onto the hairpin self-priming sequence can be determined, and this information used to identify the SNP present. Conventional assays, which rely on the detection of fluorescent labels attached to the bases, can be used to obtain the information on the SNP. These assays rely on the stepwise identification of suitably labelled bases, referred to as “single base” sequencing methods. The bases are incorporated onto the primer sequence using the polymerase reaction.

In an embodiment of the invention, the incorporation of bases is determined using fluorescently labelled nucleotides. The nascent chain (on the primer) is extended in a stepwise manner by the polymerase reaction. Each of the different nucleotides (A, T, G and C) incorporates a unique fluorophore and a group blocking the 3′ position to prevent uncontrolled polymerisation. As used herein, the term “blocking group” refers to a moiety attached to a nucleotide which, while not interfering substantially with template-dependent enzymatic incorporation of the nucleotide into a polynucleotide chain, abrogates the ability of the incorporated nucleotide to serve as a substrate for further nucleotide addition. A “removable blocking group” is a blocking group that can be removed by a specific treatment that results in the cleavage of the covalent bond between the nucleotide and the blocking group. Specific treatments can be, for example, a photochemical, chemical or enzymatic treatment that results in the cleavage of the covalent bond between the nucleotide and the block. Removal of the blocking group will restore the ability of the incorporated, formerly blocked nucleotide to serve as a substrate for further enzymatic nucleotide additions.

The polymerase enzyme incorporates a nucleotide into the nascent chain complementary to the sequence on the hairpin oligonucleotide and the blocking group prevents further incorporation of nucleotides. Unincorporated nucleotides are removed and each incorporated nucleotide is “read” optically by a charge-coupled detector using laser excitation and filters. The 3′-blocking group is then removed (deprotected), to expose the nascent chain for further nucleotide incorporation. One advantage in the use of pre-designed sequencing codes is that the sequence can be set to contain no identical contiguous bases and therefore cycles of sequencing can optionally be performed using only one non-blocked nucleotide per cycle.
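Codes with no identical contiguous bases can be enumerated directly. The following is a minimal sketch under the assumption that a code is a plain string over A, C, G and T; the function names are hypothetical. Because each position after the first has only three permitted bases, there are 4·3ⁿ⁻¹ such codes of length n, fewer than the 4ⁿ available without the constraint, so a given number of codes may need a longer code.

```python
from itertools import product


def codes_without_repeats(length: int, alphabet: str = "ACGT"):
    """Yield every code of the given length with no two adjacent bases equal."""
    for candidate in product(alphabet, repeat=length):
        if all(a != b for a, b in zip(candidate, candidate[1:])):
            yield "".join(candidate)


def min_length_without_repeats(n_codes: int, alphabet_size: int = 4) -> int:
    """Shortest code length whose repeat-free code space covers n_codes."""
    length = 1
    capacity = alphabet_size
    while capacity < n_codes:
        capacity *= alphabet_size - 1   # each further base has one fewer choice
        length += 1
    return length


print(sum(1 for _ in codes_without_repeats(5)))   # 4 * 3**4 = 324
print(min_length_without_repeats(1000))           # 7, since 4 * 3**6 = 2916
```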

Because the array consists of distinct optically resolvable oligonucleotides, each target oligonucleotide will generate a series of distinct signals as the fluorescent events are detected. Details of the sequence are then determined and can be compared with known sequence information to identify SNPs.

The number of cycles that can be achieved is governed principally by the yield of the deprotection cycle. If deprotection fails in one cycle, it is possible that later deprotection and continued incorporation of nucleotides can be detected during the next cycle. Because the sequencing is performed at the single molecule level, the sequencing can be carried out on different oligonucleotide sequences at one time without the necessity for separation of the different sample fragments prior to sequencing. This sequencing also avoids the phasing problems associated with prior art methods.

The labeled nucleotides can comprise a separate label and removable blocking group, as will be appreciated by those skilled in the art. In this context, it will usually be necessary to remove both the blocking group and the label prior to further incorporation.

Deprotection can be carried out by chemical, photochemical or enzymatic reactions. A similar, and equally applicable, sequencing method is disclosed in EP-A-0640146. Other suitable sequencing procedures will be apparent to the skilled person.

The images and other information about the arrays, e.g. positional information, are processed by a computer program which can perform image processing to reduce noise and increase signal or contrast, as is known in the art. The computer program can perform an optional alignment between images and/or cycles, extract the single molecule data from the images, correlate the data between images and cycles and specify the DNA sequence from the patterns of signal produced from the individual molecules.

In a preferred embodiment of the invention, the ligated hairpin oligonucleotide is immobilised on a solid support surface at a density that allows each oligonucleotide to be individually resolved by optical means, i.e. single molecule imaging. This means that, within the resolvable area of the particular imaging device used, there must be one or more distinct signals each representing one duplex. Typically, the detection of incorporated bases can be carried out using a single molecule fluorescence microscope equipped with a sensitive detector, e.g. a charge-coupled detector (CCD). Each duplex of the array may be analysed simultaneously or, by scanning the array, a fast sequential analysis can be performed. Methods for the preparation of single molecule arrays and for single molecule imaging are described in WO-A-00/06770.

The term “individually resolved” is used herein to indicate that, when visualised, it is possible to distinguish one duplex on the array from neighbouring duplexes. Visualisation may be effected by the use of the detectably-labelled nucleotides as discussed above.

The density of the arrays is not critical. However, the present invention can make use of a high density of immobilised molecules, and these are preferable. For example, arrays with a density of 10⁶ to 10⁹ molecules per cm² may be used. Preferably, the density is at least 10⁷/cm² and typically up to 10⁸/cm². These high density arrays are in contrast to other arrays which may be described in the art as “high density” but which are not necessarily as high and/or which do not allow single molecule resolution. On a given array, it is the number of single oligonucleotides, rather than the number of features, that is important. The concentration of nucleic acid molecules applied to the support can be adjusted in order to achieve the highest density of addressable single oligonucleotide molecules. At lower application concentrations, the resulting array will have a high proportion of addressable single oligonucleotide molecules at a relatively low density per unit area. As the concentration of nucleic acid molecules is increased, the density of addressable single oligonucleotide molecules will increase, but the proportion of single oligonucleotide molecules capable of being addressed will actually decrease. One skilled in the art will therefore recognize that the highest density of addressable single oligonucleotide molecules can be achieved on an array with a lower proportion or percentage of single oligonucleotide molecules relative to an array with a high proportion of single oligonucleotide molecules but a lower physical density of those molecules.
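The trade-off between total density and the proportion of addressable molecules can be illustrated with a simple model. The sketch below is an assumption introduced for illustration, not part of the disclosure: molecules are taken to land independently and uniformly at random (a 2-D Poisson process), and a molecule counts as addressable when no other molecule lies within a hypothetical minimum resolvable separation. Under that model the addressable fraction is exp(−ρπr²), and the addressable density ρ·exp(−ρπr²) peaks at ρ = 1/(πr²), where only about 37% of molecules are addressable.

```python
import math


def addressable_stats(density_per_cm2: float, min_separation_nm: float):
    """Return (addressable fraction, addressable density per cm^2).

    Assumes random (Poisson) placement; a molecule is addressable when no
    neighbour lies within min_separation_nm of it.
    """
    r_cm = min_separation_nm * 1e-7              # 1 nm = 1e-7 cm
    exclusion_area = math.pi * r_cm ** 2         # area that must hold no neighbour
    fraction = math.exp(-density_per_cm2 * exclusion_area)
    return fraction, density_per_cm2 * fraction


def optimal_density(min_separation_nm: float) -> float:
    """Total density that maximises addressable density in this model."""
    r_cm = min_separation_nm * 1e-7
    return 1.0 / (math.pi * r_cm ** 2)


for rho in (1e7, 1e8, optimal_density(250), 2e9):
    fraction, addressable = addressable_stats(rho, 250)
    print(f"{rho:.2e} /cm^2: {fraction:.1%} addressable, {addressable:.2e} /cm^2")
```

Sparse arrays give a high addressable proportion at low density, while denser arrays give more addressable molecules per unit area at a lower proportion, up to the optimum; beyond it both quantities fall.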

Using the methods and apparatus of the present invention, it may be possible to image at least 10⁷ or 10⁸ molecules. Fast sequential imaging may be achieved using a scanning apparatus; shifting and transfer between images may allow higher numbers of hairpin oligonucleotide molecules to be imaged.

The extent of separation between the individual oligonucleotide molecules on the array will be determined, in part, by the particular technique used for resolution. Apparatus used to image molecular arrays are known to those skilled in the art. For example, a confocal scanning microscope may be used to scan the surface of the array with a laser to image directly a fluorophore incorporated on the individual molecule by fluorescence. Alternatively, a sensitive 2-D detector, such as a charge-coupled detector, can be used to provide a 2-D image representing the individual oligonucleotide molecules on the array.

Resolving single molecules on the array with a 2-D detector can be done if, at 100× magnification, adjacent oligonucleotide molecules are separated by a distance of approximately at least 250 nm, preferably at least 300 nm and more preferably at least 350 nm. It will be appreciated that these distances are dependent on magnification, and that other values can be determined accordingly, by one of ordinary skill in the art.

Other techniques such as scanning near-field optical microscopy (SNOM) are available which are capable of greater optical resolution, thereby permitting more dense arrays to be used. For example, using SNOM, adjacent oligonucleotide molecules may be separated by a distance of less than 100 nm, e.g. 10 nm. For a description of scanning near-field optical microscopy, see Moyer et al., Laser Focus World (1993) 29(10).

An additional technique that may be used is surface-specific total internal reflection fluorescence microscopy (TIRFM); see, for example, Vale et al., Nature (1996) 380: 451-453. Using this technique, it is possible to achieve wide-field imaging (up to 100 μm×100 μm) with single molecule sensitivity. This may allow arrays of greater than 10⁷ resolvable molecules per cm² to be used.

Additionally, the techniques of scanning tunnelling microscopy (Binnig et al., Helvetica Physica Acta (1982) 55:726-735) and atomic force microscopy (Hansma et al., Ann. Rev. Biophys. Biomol. Struct. (1994) 23:115-139) are suitable for imaging the arrays of the present invention. Other devices which do not rely on microscopy may also be used, provided that they are capable of imaging within discrete areas on a solid support.

As aforementioned, the target nucleic acid molecules immobilised onto the surface of the solid support should be capable of being resolved by optical means. This means that, within the resolvable area of the particular imaging device used, there must be one or more distinct signals, each representing one polynucleotide. Thus, each molecule is individually resolvable and detectable as a single molecule fluorescent point, and fluorescence from said single molecule fluorescent point also exhibits single step photobleaching.

Clusters of substantially identical molecules do not exhibit single point photobleaching under standard operating conditions used to detect/analyze molecules on arrays. The intensity of a single molecule fluorescence spot is constant for an anticipated period of time after which it disappears in a single step. In contrast, the intensity of a fluorescence spot comprised of two or more molecules, for example, disappears in two or more distinct and observable steps, as appropriate. The intensity of a fluorescence spot arising from a cluster consisting of thousands of similar molecules, such as those present on the arrays consisting of thousands of similar molecules at any given point, for example, would disappear in a pattern consistent with an exponential decay. The exponential decay pattern reflects the progressive loss of fluorescence by molecules present in the cluster and reveals that, over time, fewer and fewer molecules in the spot retain their fluorescence.
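The single-step criterion can be expressed as a count of discrete intensity drops in a spot's fluorescence trace. The sketch below is an idealised illustration only: it assumes a noise-free trace sampled at regular intervals and a hypothetical `min_drop` threshold separating a genuine bleaching step from no change; real traces would need filtering before any such count is meaningful.

```python
def count_bleaching_steps(trace, min_drop):
    """Count frame-to-frame intensity drops of at least min_drop."""
    steps = 0
    for previous, current in zip(trace, trace[1:]):
        if previous - current >= min_drop:
            steps += 1
    return steps


def classify_spot(trace, min_drop):
    """Label a spot by the number of discrete bleaching steps in its trace."""
    steps = count_bleaching_steps(trace, min_drop)
    if steps == 0:
        return "no bleaching observed"
    if steps == 1:
        return "single molecule"
    return "multiple molecules"


single = [100, 100, 100, 0, 0]      # constant, then gone in one step
pair = [200, 200, 100, 100, 0]      # two distinct steps
print(classify_spot(single, min_drop=50))
print(classify_spot(pair, min_drop=50))
```

A cluster of thousands of molecules would not fit this scheme: its individual steps are too small to resolve, and the trace instead approaches a smooth exponential decay.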

The sequence information obtained from the polymerase reaction can be compared to a reference sequence to identify the SNPs. The reference sequence is any suitable sequence that represents the normal/general genome. Suitable reference genomes have been identified as part of the various genome sequencing efforts, for example the Human Genome Project. It is, strictly, only the defined base at the SNP site that is compared with the corresponding base on the reference sequence. The additional sequenced bases in the unique sequence code are used to deconvolute the oligonucleotides and identify the relevant part of the reference sequence under study.
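The comparison step can be sketched as a lookup followed by a base comparison. Everything named here is a hypothetical illustration: each read is assumed to be a pair of decoded sequence code and probe 5′ base, `code_table` maps codes to SNP identifiers, and `reference` gives the reference base at each SNP on the strand the probes hybridise to. The probe base is complementary to the genomic base, so it is complemented before comparison.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_genotypes(reads, code_table, reference):
    """Score each SNP from (code, probe_base) reads of captured hairpins."""
    observed = defaultdict(set)
    for code, probe_base in reads:
        snp_id = code_table.get(code)
        if snp_id is None:
            continue                      # unknown code: ignore the read
        # The probe's 5' base pairs with the genome, so complement it.
        observed[snp_id].add(COMPLEMENT[probe_base])

    calls = {}
    for snp_id, alleles in observed.items():
        calls[snp_id] = {
            "alleles": sorted(alleles),
            "zygosity": "homozygous" if len(alleles) == 1 else "heterozygous",
            "matches_reference": alleles == {reference[snp_id]},
        }
    return calls


reads = [("ACGTA", "C"), ("ACGTA", "C"), ("CATGC", "A"), ("CATGC", "G")]
code_table = {"ACGTA": "snp1", "CATGC": "snp2"}
reference = {"snp1": "G", "snp2": "T"}
print(call_genotypes(reads, code_table, reference))
```

Here `snp1` is scored as homozygous and identical to the reference, while `snp2` shows two alleles and is scored as heterozygous; a spot with more than two alleles would indicate an error and is not handled in this sketch.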

1. A method for determining the identity of one or more mutations or single nucleotide polymorphisms (SNPs) in a genome, comprising: a. contacting a sample genome, under conditions which permit template dependent oligonucleotide ligation, with a plurality of different oligonucleotide molecules which comprise (i) a first set of at least two oligonucleotides, each comprising a sequence of nucleotides that is complementary to a region on said genome that includes a known SNP site, wherein a nucleotide complementary to the known SNP site is at or near the 5′ end of each of said oligonucleotides and, each of said oligonucleotides further comprises a unique label which is a unique coding sequence of nucleotides, wherein said unique coding sequence of nucleotides is specific for the nucleotide complementary to the known SNP site and the position of the SNP to be scored, (ii) a second set of at least two oligonucleotides, each oligonucleotide comprising a sequence of nucleotides complementary to a region on said target genome for hybridisation with said target genome adjacent to the 5′ end of an oligonucleotide of said first set of at least two oligonucleotides, and a surface capture moiety, a phosphate moiety being located at any of either the 5′ end of said first set of at least two oligonucleotides or the 3′ end of said second set of at least two oligonucleotides, wherein said contacting effects hybridization of the first and second set of at least two oligonucleotides to the sample genome and generates ligated oligonucleotides, b. immobilising the ligated oligonucleotides on a solid support via the surface capture moiety to generate immobilised ligated oligonucleotides, c. performing a sequencing reaction on the immobilised ligated oligonucleotides to determine at least the unique coding sequence of nucleotides of one or more of said unique labels, wherein determining the unique coding sequence of nucleotides of a unique label identifies the nucleotide complementary to the known SNP site and the position of the SNP to be scored, and comparing identified nucleotides complementary to known SNP sites in any of said immobilised oligonucleotides to those of one or more reference SNPs.

2. The method according to claim 1, wherein in step (a) each of said oligonucleotides in said first oligonucleotide set includes one of any of the defined nucleotide bases A, C, T or G for testing for complementarity with said SNP.

3. The method according to claim 1, wherein each of the oligonucleotides of said first oligonucleotide set includes a hairpin oligonucleotide.

4. The method according to claim 1 wherein said oligonucleotides are immobilised on said support at a density that allows each immobilised oligonucleotide to be individually resolved by optical microscopy.

5. The method according to claim 1 wherein the ligated product of said first and second sets of oligonucleotides comprises between 10 and 70 bases.

6. The method according to claim 1 wherein the ligated product of said first and second sets of oligonucleotide comprises from 30 to 50 bases.

7. The method according to claim 1, wherein said method is performed for a plurality of SNPs.

8. The method according to claim 1 wherein said sample genomic DNA is fragmented prior to contacting with said sets of oligonucleotides.

9. The method according to claim 1, wherein said oligonucleotides are contacted with said genome in the presence of a DNA ligase.

10. The method according to claim 1, wherein said first and second sets of oligonucleotides are contacted with said genome under conditions that permit non-enzymatic chemical ligation.

11. The method according to claim 10, wherein said oligonucleotides comprise 5′-iodide and 3′-selenophosphate.